A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences

نویسندگان

  • April Kontostathis
  • William M. Pottenger
چکیده

Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI’s use of higher orders of co-occurrence is a critical component of this study. In this work we present experiments that precisely determine the degree of co-occurrence used in LSI. We empirically demonstrate that LSI uses up to fifth order term co-occurrence. We also prove mathematically that a connectivity path exists for every nonzero element in the truncated term-term matrix computed by LSI. A complete understanding of this term transitivity is key to understanding LSI.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A framework for understanding Latent Semantic Indexing (LSI) performance

In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second ...

متن کامل

Term Representation with Generalized Latent Semantic Analysis

Document indexing and representation of termdocument relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence based measures of semantic similarity obtained fr...

متن کامل

Multi-view learning via probabilistic latent semantic analysis

Multi-view learning arouses vast amount of interest in the past decades with numerous real-world applications in web page analysis, bioinformatics, image processing and so on. Unlike the most previous works following the idea of co-training, in this paper we propose a new generative model for Multi-view Learning via Probabilistic Latent Semantic Analysis, called MVPLSA. In this model, we jointl...

متن کامل

Memory-restricted latent semantic analysis to accumulate term-document co-occurrence events

0167-8655/$ see front matter 2012 Elsevier B.V. A http://dx.doi.org/10.1016/j.patrec.2012.05.002 ⇑ Corresponding author. E-mail addresses: [email protected] (S.-H (J.-H. Lee). This paper addresses a novel adaptive problem of obtaining a new type of term-document weight. In our problem, an input is given by a long sequence of co-occurrence events between terms and documents, namely, a stream ...

متن کامل

Latent Semantic Kernels for Feature Selection Produced as Part of the Esprit Working Group in Neural and Computational Learning Ii, Neurocolt2 27150

Latent Semantic Indexing is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from document co-occurrences. The paper demonstrates how this method can be implemented implicitly to a kernel deened feature space and hence adapted for application to any kernel based learning algorithm and data. Experiments with...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002